Welcome back to deep learning. So today I want to talk to you about the actual pooling implementation.
Pooling layers are an essential building block of many deep networks. The main idea behind them
is that you want to reduce the dimensionality across the spatial domain. So here we see this
small example where we summarize the information in the green rectangles, the blue rectangles,
the yellow and the red ones to only one value. So we have a two by two input that has to be mapped
to a single value. Now this of course reduces the number of parameters. It introduces a hierarchy
and allows you to work with spatial abstraction. Furthermore it reduces computational cost and
overfitting. Of course, this relies on some basic assumptions. One of them is that the features are hierarchically structured. By pooling we reduce the output size
and introduce this hierarchy that should be intrinsically present in the signal.
We talked about eyes being composed of edges and lines, and faces being a composition of eyes and mouth.
This has to be present in order to make pooling a sensible operation to be included in your network.
Here you see a three-by-three pooling layer, and we choose max pooling. In max pooling, only the largest value of a receptive field is propagated to the output. Obviously, we can also work with larger strides. Typically, the stride equals the neighborhood size such that we get one output per receptive field. The problem here is, of course, that the maximum operation adds an additional non-linearity, and therefore we also have to think about how to resolve this step in the gradient procedure. Essentially, we again use the concept of the subgradient, where we simply propagate the error into the cell that produced the maximum output. So you could say the winner takes it all.
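To make this concrete, here is a small sketch of such a pooling step (my own illustration, not code from the lecture): NumPy max pooling over non-overlapping two-by-two neighborhoods, so the stride equals the neighborhood size. The function name and the toy input are just assumptions for this example.

```python
import numpy as np

def max_pool2d(x, size=2):
    # max pooling with stride = neighborhood size: one output per receptive field
    h, w = x.shape
    blocks = x[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 3.]])
print(max_pool2d(x))
# [[4. 2.]
#  [2. 7.]]
```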
An alternative to this is average pooling, where we simply compute the average over the neighborhood. However, it does not consistently perform better than max pooling. In the backpropagation pass, the error is simply shared in equal parts and backpropagated to the respective units.
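To contrast the two backward passes, here is a hedged sketch for a single two-by-two neighborhood (again my own illustration with made-up function names): max pooling routes the entire incoming gradient to the winning cell, while average pooling distributes it in equal parts.

```python
import numpy as np

def max_pool_backward(window, grad_out):
    # winner takes it all: the full gradient goes to the cell that produced the maximum
    grad_in = np.zeros_like(window)
    idx = np.unravel_index(np.argmax(window), window.shape)
    grad_in[idx] = grad_out
    return grad_in

def avg_pool_backward(window, grad_out):
    # equal share: every cell receives grad_out divided by the number of cells
    return np.full_like(window, grad_out / window.size)

window = np.array([[1., 3.],
                   [4., 2.]])
print(max_pool_backward(window, 1.0))  # [[0. 0.] [1. 0.]]
print(avg_pool_backward(window, 1.0))  # [[0.25 0.25] [0.25 0.25]]
```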
There are many more pooling strategies, such as fractional max pooling, Lp pooling, stochastic pooling, spatial pyramid pooling, generalized pooling, and many more. There is a whole family of such strategies.
Two alternatives that we already talked about are strided and atrous (dilated) convolutions. These became really popular because you don't have to encode the max pooling as an additional step, and you save computation. Typically, people now use strided convolutions with a stride s greater than one in order to implement convolution and pooling at the same time.
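As a sketch of this idea (an illustration of mine with arbitrary layer sizes, not code from the lecture), a PyTorch convolution with stride two produces the same output resolution as a convolution followed by two-by-two max pooling, but the downsampling is learned inside the convolution itself:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one RGB image of size 32 x 32

# classical building block: convolution followed by a separate pooling step
conv_then_pool = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# strided convolution: convolution and downsampling in a single step
strided_conv = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

print(conv_then_pool(x).shape)  # torch.Size([1, 16, 16, 16])
print(strided_conv(x).shape)    # torch.Size([1, 16, 16, 16])
```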
So let's recap what our convolutional neural networks are doing. We talked about the convolutions producing feature maps and the pooling reducing the size of the respective feature maps. Then again convolutions and pooling, until we end up at an abstract representation. Finally, we had
these fully connected layers in order to do the classification. Actually we can kick out this last
block, because we've seen that if we reformat the spatial dimensions into the channel direction, then we can replace it with a one-by-one convolution. Subsequently, we just apply this to get our final classification. Hence, we can reduce the number of building blocks further: we don't even need fully connected layers anymore. Everything then becomes fully convolutional, and we can express essentially the entire chain of operations by convolutions and pooling steps. The nice thing about the one-by-one convolutions is that if you
combine this with something that is called global average pooling then you can essentially also
process input images of arbitrary size. The idea here is that, at the end of the convolutional processing, you simply map into the channel direction and compute the global average over the entire input. This works because the global pooling operation is defined independently of the input size, so the network becomes applicable to images of arbitrary size. So again, we benefit from the ideas of pooling and convolution.
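A minimal sketch of such a fully convolutional head (my own example with arbitrary channel and class counts) could look as follows: a one-by-one convolution maps into the channel direction, and a global average pool collapses the spatial dimensions, so the same network accepts inputs of different sizes.

```python
import torch
import torch.nn as nn

num_classes = 10  # arbitrary choice for this sketch

fully_conv_head = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # convolutional feature extraction
    nn.ReLU(),
    nn.Conv2d(64, num_classes, kernel_size=1),   # 1 x 1 convolution instead of a fully connected layer
    nn.AdaptiveAvgPool2d(1),                     # global average pooling over the spatial domain
    nn.Flatten(),                                # (batch, num_classes) class scores
)

for size in (32, 64, 121):                       # arbitrary input sizes
    x = torch.randn(1, 3, size, size)
    print(size, fully_conv_head(x).shape)        # always torch.Size([1, 10])
```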
An interesting concept that we will also have a look at in more detail later in this lecture is the inception model. This approach is from the paper "Going Deeper with Convolutions" [8]. Following our self-stated motto: we need to go deeper. This network won the ImageNet challenge in 2014. One incarnation is GoogLeNet, which is inspired by [4]. The idea that they presented tackles the problem of having to fix the alternation of convolution and pooling steps. Why not allow the network to learn on its own when it wants to pool and when it wants to convolve? The idea is that the network figures this out by itself during training.
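As a teaser of what such a module could look like (a simplified sketch of my own, not the actual GoogLeNet code), an inception-style block runs convolutions of different kernel sizes and a pooling branch in parallel and concatenates their outputs along the channel dimension, so that training can decide how much each operation contributes:

```python
import torch
import torch.nn as nn

class InceptionStyleBlock(nn.Module):
    # simplified inception-style block: parallel branches, concatenated along the channels
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionStyleBlock(32)(x).shape)  # torch.Size([1, 64, 28, 28])
```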
Deep Learning - Activations, Convolutions, and Pooling Part 4
This video presents max and average pooling, introduces the concept of fully convolutional networks, and hints on how this is used to build deep networks.
References:
[1] I. J. Goodfellow, D. Warde-Farley, M. Mirza, et al. “Maxout Networks”. In: ArXiv e-prints (Feb. 2013). arXiv: 1302.4389 [stat.ML].
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. In: CoRR abs/1502.01852 (2015). arXiv: 1502.01852.
[3] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, et al. “Self-Normalizing Neural Networks”. In: Advances in Neural Information Processing Systems (NIPS). Vol. abs/1706.02515. 2017. arXiv: 1706.02515.
[4] Min Lin, Qiang Chen, and Shuicheng Yan. “Network In Network”. In: CoRR abs/1312.4400 (2013). arXiv: 1312.4400.
[5] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. “Rectifier Nonlinearities Improve Neural Network Acoustic Models”. In: Proc. ICML. Vol. 30. 1. 2013.
[6] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. “Searching for Activation Functions”. In: CoRR abs/1710.05941 (2017). arXiv: 1710.05941.
[7] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning”. In: arXiv preprint arXiv:1702.03118 (2017).
[8] Christian Szegedy, Wei Liu, Yangqing Jia, et al. “Going Deeper with Convolutions”. In: CoRR abs/1409.4842 (2014). arXiv: 1409.4842.
Further Reading:
A gentle Introduction to Deep Learning